Abstract

We study two clustering problems: Starforest Editing, the problem
of adding and deleting edges to obtain a disjoint union of stars, and its
generalization Bicluster Editing. We show that, in addition to being
NP-hard, neither problem can be solved in subexponential time unless
the exponential time hypothesis fails.

Misra, Panolan, and Saurabh (MFCS 2013) argue that introducing a bound
on the number of connected components in the solution should not make
the problem easier: in particular, they argue that the subexponential time
algorithm for editing to a fixed number of clusters (p-Cluster Editing)
by Fomin et al. (J. Comput. Syst. Sci., 80(7) 2014) is an exception rather
than the rule. Here, p is a secondary parameter, bounding the number of
components in the solution.

However, upon bounding the number of stars or bicliques in the solution,
we obtain algorithms which run in time $2^{5\sqrt{pk}} + O(n+m)$ for p-Starforest
Editing and $2^{O(p\sqrt{k}\log(pk))} + O(n+m)$ for p-Bicluster Editing. We
obtain a similar result for the more general case of t-Partite p-Cluster
Editing. This is subexponential in k for a fixed number of clusters, since p
is then considered a constant.

Our results even out the number of multivariate subexponential time 
algorithms and give reasons to believe that this area warrants further 
study. 


1 Introduction 

Identifying clusters and biclusters has been a central motif in data mining
research [21] and forms the cornerstone of algorithmic applications in, e.g.,
biology [24] and expression data analysis [7]. Cai [6] showed that clustering,
among many other graph modification problems of similar flavor, is solvable in
fixed-parameter tractable time. Parallel to these general results, some progress
was made in the area of graphs of topological nature: many problems are, when
restricted to classes characterized by a finite set of forbidden minors, solvable
in subexponential parameterized time, i.e., they admit algorithms with time
complexity $2^{o(k)} \cdot \mathrm{poly}(n)$.

The complexity class of problems admitting such an algorithm is called
SUBEPT and was defined by Flum and Grohe in the seminal textbook on
parameterized complexity [13]. They simultaneously noticed that most natural
problems do not, in fact, live in this complexity class: the classical NP-hardness
reductions paired with the exponential time hypothesis of Impagliazzo, Paturi
and Zane [19] are enough to show that no $2^{o(k)} \cdot \mathrm{poly}(n)$ algorithm exists.

In this context, Jianer Chen posed the following open problem in the field of
parameterized algorithms [5]: Are there examples of natural problems on graphs
that do not have such a topological constraint and also have subexponential
parameterized running time? Alon, Lokshtanov and Saurabh [1] partially answered
this question in the positive by providing a subexponential time algorithm for
Feedback Arc Set on tournament graphs. However, tournament graphs form
a rather atypical class of graphs¹, so Chen's question cannot be considered fully
answered: are there problems which are in SUBEPT on general graphs?

This is indeed the case. Fomin and Villanger [15] showed that Minimum
Fill-In is solvable in time $2^{O(\sqrt{k}\log k)} + \mathrm{poly}(n)$. Minimum Fill-In is the
problem of completing a graph into a chordal graph, adding as few edges as
possible. Following this, a line of research was established investigating whether
more graph modification problems admit such algorithms. It proved to be a
fruitful area: since the result by Fomin and Villanger, we now know that several
graph modification problems towards classes such as split graphs [16], threshold
graphs [10], trivially perfect graphs [11], (proper) interval graphs [3, 4] and more
admit subexponential time algorithms.

While these classes are rather “simple”, they certainly are much more complex
than simple cluster or bicluster graphs. Therefore, the problems Cluster
Editing and Cluster Deletion were logical candidates for subexponential
time algorithms. Surprisingly, we cannot expect that such algorithms exist.
Komusiewicz and Uhlmann gave an elegant reduction proving that both
parameterized and exact subexponential time algorithms are not achievable unless
ETH fails [20]. On the other hand, the problem p-Cluster Editing, where
the number of components in the target class is fixed to be at most p, does
indeed admit a subexponential parameterized time algorithm, which is rather
surprising. This was shown by Fomin et al. [14], who designed an algorithm
solving this problem in time $2^{O(\sqrt{pk})} \cdot \mathrm{poly}(n)$.

Misra, Panolan, and Saurabh [22] explicitly stated their surprise about this
result: in their opinion, bounding the number of components in the target graph
should in general not facilitate subexponential time algorithms (ibid.): “We show
that this sub-exponential time algorithm for the fixed number of cliques is rather
an exception than a rule.”

We show that the related problem Bicluster Editing and its generalization
t-Partite p-Cluster Editing, as well as the special case Starforest Editing,
also belong to this exceptional class of problems where a bound on the number
of target components greatly improves their algorithmic tractability. Since
Bicluster Editing is an important tool in molecular biology and biological data
analysis² and the necessary second parameter is not outlandish in these settings,
we feel that this is a noteworthy insight. We complement these results with
NP-completeness proofs for Bicluster Editing and t-Partite p-Cluster
Editing on subcubic graphs and further show that, unless ETH fails, no
parameterized or exact subexponential algorithm is possible without the secondary
parameter. That a bound on the maximal degree does not contribute towards
making these problems more tractable contrasts with many other graph modification
problems (like modifications towards split and threshold graphs [23]) which are
polynomial time solvable in this setting.

¹ For instance, Dominating Set is W[2]-hard on tournament graphs, but not expected to
be NP-hard.

² For more motivations for biclustering problems, we refer to the two surveys related to
biological research, by Madeira and Oliveira [21], and by Tanay, Sharan and Shamir [24].

Previously, it was known that Bicluster Editing in general is NP-complete [2],
and Guo, Hüffner, Komusiewicz, and Zhang [17] studied the problem from a
parameterized point of view, giving a linear problem kernel with 6k vertices and
an algorithm solving the problem in time $O(3.24^k + m)$.

Our contribution. In this paper, we study both the very general t-Partite
p-Cluster Editing as well as editing to the aforementioned special cases. On
the positive side, we show that

• p-Starforest Editing is solvable in time $O(2^{5\sqrt{pk}} + n + m)$, and

• both p-Bicluster Editing and the more general t-Partite p-Cluster
Editing are solvable in time $2^{O(p\sqrt{k}\log(pk))} + O(n+m)$, facilitated by a
kernel of size $O(ptk)$, where t = 2 in the case of p-Bicluster Editing.

In many cases, p is considered a constant, and in this case our kernel has size
linear in k. We supplement these algorithms with hardness results. Specifically,
we show that

• assuming ETH, Starforest Editing and Bicluster Editing cannot
be solved in time $2^{o(k)} \cdot \mathrm{poly}(n)$ and thus neither can t-Partite Cluster
Editing, and

• p-Starforest Editing is W[1]-hard if parameterized by p alone.

Organization of the paper. In Section 3 we give a subexponential time
parameterized algorithm for the Starforest Editing problem when parameterized
by the editing budget and the number of stars in the solution simultaneously.

A necessary ingredient for our subexponential algorithms is a polynomial
kernel. A kernel for Bicluster Editing exists already [17] and we provide
one for the t-partite case in Section 4. In Section 5 we show that p-Bicluster
Editing is solvable in subexponential time in k: we give a
$2^{O(p\sqrt{k}\log(pk))} + O(n+m)$ algorithm and generalize it to editing to
t-partite p-cluster graphs. The parameter p is usually considered to be a fixed
constant, hence the running time is truly subexponential, $2^{o(k)} + O(n+m)$, in
the editing budget k. However, for a more fine-grained complexity analysis and
for lower bounds, we treat p as a parameter.

In Section 6 we show that we cannot expect such an algorithm without an
exponential dependency on p: the problem is not solvable in time $2^{o(k)} n^{O(1)}$
unless ETH fails. Further, we show that p-Starforest Editing is W[1]-hard if
parameterized by p alone, before we conclude in Section 7.
2 Preliminaries


We consider only finite simple graphs G = (V, E) and we use n and m to denote
the size of the vertex set and edge set, respectively. We denote by $N_G(v)$ the
set of neighbors of v in G, and let $\deg_G(v) = |N_G(v)|$. We omit subscripts when
the graph in question is clear from context. We refer to the monograph by
Diestel [9] for graph terminology and notation not defined here. For information
on parameterized complexity, we refer to the textbook by Flum and Grohe [13].
We consider an edge in $E(G)$ to be a set of size two, i.e., $e \in E(G)$ is of the
form $\{u, v\} \subseteq V(G)$ with $u \neq v$. We denote by $[V(G)]^2$ the set of all size-two
subsets of $V(G)$. When $F \subseteq [V(G)]^2$, we write $G \triangle F$ to denote $G' = (V, E \triangle F)$,
where $\triangle$ is the symmetric difference, i.e., $E \triangle F = (E \setminus F) \cup (F \setminus E)$. When the
graph is clear from context, we will refer to F simply as a set of edges rather
than $F \subseteq [V(G)]^2$.
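The edit operation can be made concrete with a minimal sketch, assuming edges are stored as two-element frozensets; the helper names `edge` and `apply_edits` are illustrative and not part of the paper.

```python
def edge(u, v):
    """Model an edge {u, v} as a two-element frozenset (u != v in a simple graph)."""
    if u == v:
        raise ValueError("simple graphs have no loops")
    return frozenset((u, v))


def apply_edits(edges, edits):
    """Return the edge set of G triangle F, i.e. (E \\ F) | (F \\ E)."""
    return set(edges) ^ set(edits)


E = {edge(1, 2), edge(2, 3)}
F = {edge(2, 3), edge(1, 3)}  # deletes {2, 3} and adds {1, 3}
edited = apply_edits(E, F)
```

Applying the same set F a second time restores E, since the symmetric difference is an involution.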

Let us fix the following terminology: A star graph is a tree of diameter at
most two (a graph isomorphic to $K_{1,t}$ for some t). The degree-one vertices are
called leaves and the vertex of higher degree the center. A starforest is a forest
whose connected components are stars or, equivalently, a graph that does not
contain any of $\{K_3, P_4, C_4\}$ as an induced subgraph. A biclique is a complete
bipartite graph $K_{a,b}$ for some $a, b \in \mathbb{N}$, and a bicluster graph is a disjoint
union of bicliques. A t-partite clique graph is a graph whose vertex set can be
partitioned into at most t independent sets, all pairwise fully connected, and a
t-partite cluster graph is a disjoint union of t-partite cliques. The problem of
editing towards a starforest (resp. bicluster and t-partite cluster) is the
algorithmic problem of adding and deleting as few edges as possible to convert a
graph G to a starforest (resp. bicluster and t-partite cluster).
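As an illustration of the definition, the following sketch tests whether a graph is a starforest by checking each connected component. It relies on the elementary fact that a connected graph on s vertices is a star exactly when it has s - 1 edges and some vertex of degree s - 1. The function name and the input format (a vertex list plus a list of pairs) are assumptions made for this example.

```python
def is_starforest(vertices, edges):
    """Return True if every connected component of the graph is a star."""
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen = set()
    for start in vertices:
        if start in seen:
            continue
        # Collect the connected component of `start` with a depth-first search.
        component, stack = [start], [start]
        seen.add(start)
        while stack:
            x = stack.pop()
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    component.append(y)
                    stack.append(y)
        size = len(component)
        edge_count = sum(len(adj[x]) for x in component) // 2
        max_degree = max(len(adj[x]) for x in component)
        # A star on `size` vertices is a tree with a vertex adjacent to all others.
        if edge_count != size - 1 or max_degree != size - 1:
            return False
    return True
```

The three forbidden induced subgraphs are rejected by this test: a triangle and a four-cycle have too many edges, and a path on four vertices has no vertex of degree three.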

Exponential time hypothesis. To show that there is no subexponential
time algorithm for Starforest Editing we give a linear reduction from 3Sat,
that is, a reduction which constructs an instance whose parameter is bounded
linearly in the size of the input formula. The constructed instance will also have
size bounded linearly in the size of the formula, and we use this to also rule
out an exact subexponential algorithm of the form $2^{o(n+m)} \cdot \mathrm{poly}(n)$. Pipelining
such a reduction with an assumed subexponential parameterized algorithm for
the problem would give a subexponential algorithm for 3Sat, contradicting the
complexity hypothesis of Impagliazzo, Paturi, and Zane [19]. Their Sparsification
Lemma shows that, unless ETH fails, 3Sat cannot be solved in time $2^{o(n+m)}$,
where n and m here refer to the number of variables and the number of clauses,
respectively.


3 Editing to starforests in subexponential time 

A first natural step in handling modification problems related to bicluster graphs 
is modification towards the subclass of bicluster graphs called starforest. Recall 
that a graph is a starforest if it is a bicluster where every biclique has one side 
of size exactly one, or equivalently, every connected component is a star. 
Starforest Editing

Input: A graph G = (V, E) and a non-negative integer k.

Parameter: k

Question: Is there a set F of at most k edges such that $G \triangle F$ is a
disjoint union of stars?

The problem where only edge deletions are allowed is referred to as Starforest
Deletion. These two problems can easily be observed to be equivalent: adding
an edge to a forbidden induced subgraph will create one of the other forbidden
subgraphs, or simply put, it never makes sense to add an edge.

In Section 6 we show that this problem is NP-hard, and that it is not solvable
in time $2^{o(k)} n^{O(1)}$ unless the exponential time hypothesis fails.

Multivariate analysis. Since no subexponential algorithm is possible under
ETH, we introduce a secondary parameter p which bounds the number of
connected components in a solution graph. This has previously been done with
success in the Cluster Editing problem [14]. Hence, we define the following
multivariate variant of the above problem.

p-Starforest Editing

Input: A graph G = (V, E) and a non-negative integer k.

Parameter: p, k

Question: Is there a set F of edges of size at most k such that $G \triangle F$
is a disjoint union of exactly p stars?


Observe that this problem is not the same as p-Starforest Deletion since
we might need to merge stars to achieve the desired value p for the number of
connected components. In Section 6 we show that the problem is W[1]-hard
parameterized by p alone, and that we therefore need to parameterize on both p
and k.

Lemma 1. Let (G, k) be input to p-Starforest Editing. If (G, k) is a
yes-instance, there can be at most p + 2k vertices with degree at least 2.

Proof. Suppose $H = G \triangle F$ is a disjoint union of p stars with $|F| \le k$. Let C be
the set of p centers. Now, $V(H) \setminus C$ is a set of leaves of which at most 2k can
be incident to F in G. Since all other leaves must already have degree one in G,
the claim follows. □
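Lemma 1 yields a cheap rejection test. The sketch below counts the vertices of degree at least two and compares the count against p + 2k; the function name `passes_degree_check` and the input format are assumptions for illustration only.

```python
def passes_degree_check(vertices, edges, p, k):
    """Return False if Lemma 1 already rules out a yes-instance."""
    degree = {v: 0 for v in vertices}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    high_degree = sum(1 for d in degree.values() if d >= 2)
    return high_degree <= p + 2 * k
```

A return value of True does not certify a yes-instance; it only means the necessary condition of the lemma is met.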

The following bound will be key to obtain the subexponential running time. 

Proposition 2 ([14]). If a and b are non-negative integers, then $\binom{a+b}{a} \le 2^{2\sqrt{ab}}$.
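The bound can be sanity-checked numerically. The following sketch, a check on small values rather than a proof, compares the binomial coefficient $\binom{a+b}{a}$ with $2^{2\sqrt{ab}}$; the function name is hypothetical.

```python
from math import comb, sqrt


def binomial_bound_holds(a, b):
    """Check binom(a + b, a) <= 2 ** (2 * sqrt(a * b)) for one pair (a, b)."""
    return comb(a + b, a) <= 2 ** (2 * sqrt(a * b))


# Exhaustive check on a small range of non-negative integers.
all_hold = all(binomial_bound_holds(a, b) for a in range(60) for b in range(60))
```

In the algorithm of this section the proposition is applied with a = p and b = 2k, which bounds the number of candidate center sets by $2^{2\sqrt{2pk}}$.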

Lemma 3. Given a graph G and a vertex set S, we can compute in linear time
$O(n+m)$ an optimal editing set F such that $G \triangle F$ is a starforest, when restricted
to have S as the set of centers in the solution.

Proof. Observe that we need to delete every edge whose endpoints either lie both
inside S or both outside of S. What remains is a bipartite graph with S being
one side of the bipartition. To complete the editing, for every vertex $v \in V \setminus S$
with $\deg(v) > 1$, we delete all but one edge, and for every isolated vertex of
$V \setminus S$, we arbitrarily attach it to some vertex of S. It is easy to see that this
solution is optimal. □
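The procedure in the proof can be sketched as follows, assuming the graph is given as a vertex list and a list of pairs, and that S is non-empty. The function name and the choice of an arbitrary fallback center are illustrative; dictionary and set operations are taken to run in expected constant time.

```python
def edit_to_starforest_with_centers(vertices, edges, centers):
    """Return an edit set F (frozenset edges) making `centers` the star centers."""
    centers = set(centers)
    if not centers:
        raise ValueError("at least one center is required")
    edits = set()
    # For every non-center vertex, remember its neighbors among the centers.
    center_neighbors = {v: [] for v in vertices if v not in centers}
    for u, v in edges:
        if (u in centers) == (v in centers):
            # Both endpoints inside S or both outside S: the edge must go.
            edits.add(frozenset((u, v)))
        elif u in centers:
            center_neighbors[v].append(u)
        else:
            center_neighbors[u].append(v)
    fallback = next(iter(centers))  # arbitrary center for isolated vertices
    for leaf, adjacent_centers in center_neighbors.items():
        if adjacent_centers:
            # Keep one edge to a center and delete the others.
            for c in adjacent_centers[1:]:
                edits.add(frozenset((leaf, c)))
        else:
            edits.add(frozenset((leaf, fallback)))
    return edits
```

For example, on the path 1-2-3-4-5 with centers {2, 4}, vertex 3 sees both centers, so exactly one of its two edges is deleted and the edit set has size one.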
We now describe an algorithm which solves p-Starforest Editing in time
$O(2^{5\sqrt{pk}} + n + m)$.


The algorithm. Let (G, k) be an input instance for p-Starforest Editing.
If the number of vertices of degree at least two is greater than p + 2k, we
say no in accordance with Lemma 1. Otherwise we split the graph into $G_1$
and $G_2$ as follows: Let $X \subseteq V(G)$ be the collection of vertices contained in
connected components of size one or two, i.e., $G[X]$ is a collection of isolated
vertices and edges. Let $G_1 = G[X]$ and $G_2 = G[V(G) \setminus X]$. Clearly, there are
no edges going out of X in G. We will treat $G_1, G_2$ as (almost) independent
subinstances by guessing the budgets $k_1 + k_2 = k$ and the number of components
in their respective solutions $p_1 + p_2 = p$. The only time we cannot treat them as
independent instances is when $p_1$ or $p_2$ is zero. Let $p_i^*$ be the number of stars
completely contained in $G_i$ in an optimal solution. If both $p_1^*, p_2^* > 0$, then there
always exists an optimal solution that does not add any edge between $G_1$ and $G_2$.

Solving $(G_1, k_1)$ with $p_1$ components: Assume $G_1$ contains s isolated edges
and t isolated vertices, with $p_1 > 0$. If $|V(G_1)| < p_1$, we immediately say no,
since we need exactly $p_1$ connected components. Depending on the values of s
and t, we execute the following operations as long as the budget $k_1$ is positive.
If $s < p_1$ and $s + t < p_1$, we have too few stars, and we arbitrarily delete edges
to increase the number of connected components to $p_1$.

If s = 0 we turn the isolated vertices arbitrarily into $p_1$ stars. Otherwise,
fix an arbitrary endpoint c of an isolated edge. Assume that $s \le p_1$; then we
connect enough isolated vertices to c such that the number of stars is $p_1$. Finally,
if $s > p_1$, we first dissolve $s - p_1$ edges and continue as in the previous case. It
is easy to check that the above solutions are optimal.

Solving $(G_2, k_2)$ with $p_2$ components: By Lemma 1, the number of vertices
of degree at least two is bounded by $p_2 + 2k_2$. Every vertex of degree one in $G_2$
is adjacent to a vertex of larger degree, thus it never makes sense to choose it as
a center (its neighbor will always be cheaper). Hence, it suffices to enumerate
every set $S_2$ of $p_2$ vertices of degree larger than one and test in linear time,
as per Lemma 3, whether a solution inside the budget $k_2$ is possible. Using
Proposition 2 we can bound the running time by

$$\binom{p_2 + 2k_2}{p_2} \cdot pk + O(n+m) = O(2^{5\sqrt{p_2 k_2}} + n + m).$$

We are left with the cases where $p_1$ or $p_2$ is equal to zero: then the only possible
solution is to remove all edges within $G_1$ or $G_2$, respectively, and connect all the
resulting isolated vertices to an arbitrary center in the other instance. We either
follow through with the operation, if within the respective budget, or deduce
that the subinstance is not solvable. We conclude that the above algorithm will
at some point guess the correct budgets for $G_1$ and $G_2$ and thus find a solution
of size at most k. The theorem follows.

Theorem 4. p-Starforest Editing is solvable in time $O(2^{5\sqrt{pk}} + n + m)$.
4 A polynomial kernel for t-partite p-cluster editing

We show a simple $O(ktp)$ kernel for the t-Partite p-Cluster Editing problem,
which will be the foundation of the subsequent subexponential algorithms, with
a single rule, Rule 1, which can be exhaustively applied in time $O(n+m)$. The
problem at hand is the following generalization of p-Bicluster Editing:

t-Partite p-Cluster Editing

Input: A graph G = (V, E) and a non-negative integer k.

Parameter: p, k

Question: Is there a set $F \subseteq [V]^2$ of edges of size at most k such
that $G \triangle F$ is a disjoint union of exactly p complete t-partite
graphs?


For our rule, we say that a set $X \subseteq V(G)$ is a non-isolate twin class if for
every v and v' in X, $N_G(v) = N_G(v') \neq \emptyset$. Note that this is by definition a
false twin class, i.e., $vv' \notin E(G)$, or in other words, a non-isolate twin class is an
independent set.

Rule 1. If there is a non-isolate twin class $X \subseteq V(G)$ of size at least 2k + 2,
then delete all but 2k + 1 of its vertices.

Lemma 5. Rule 1 is sound and can be exhaustively applied in linear time.

Proof. To reduce the number of connected components by one we need to add at
least one edge. Hence, a yes-instance cannot contain more than p + k connected
components.

It is sufficient to observe that a non-isolate class of false twins X of size at
least 2k + 1 will never be touched by a minimal solution. Let (G, k) be a yes
instance with F a solution. Suppose X is a non-isolate class of false twins of
size at least 2k + 1. At most 2k vertices are touched by F, and we claim that F',
the set of edges of F not incident to any vertex of X, is a solution. Let $x \in X$
be a vertex not incident to F. This means that $N_G(x)$ is exactly the entire
complete t-partite component except its own part. But since t-partite p-cluster
graphs are closed under adding non-isolate false twins, we may add as many false
twins to x in G as we want without changing the solution. It follows that we may
assume that its false twins will not be touched by F and hence F' is a solution
as well.

The rule can be applied in linear time by first computing a modular
decomposition of the input graph, which can be done in linear time [18], and marking
all the vertices to be deleted. □
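Rule 1 can be sketched without a modular decomposition by grouping vertices on their neighborhoods. This is a swapped-in technique, not the method of the proof: it hashes each open neighborhood, which takes expected time proportional to the sum of the degrees. The function name and input format are assumptions for illustration.

```python
from collections import defaultdict


def apply_rule_1(vertices, edges, k):
    """Trim every non-isolate twin class to at most 2k + 1 vertices."""
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    # Vertices with identical non-empty neighborhoods form a twin class.
    classes = defaultdict(list)
    for v in vertices:
        if adj[v]:
            classes[frozenset(adj[v])].append(v)
    removed = set()
    for members in classes.values():
        if len(members) >= 2 * k + 2:
            removed.update(members[2 * k + 1:])
    kept_vertices = [v for v in vertices if v not in removed]
    kept_edges = [(u, v) for u, v in edges
                  if u not in removed and v not in removed]
    return kept_vertices, kept_edges
```

A single pass suffices here: at least one vertex of each trimmed class survives, so two vertices whose neighborhoods differed before the trimming still differ afterwards.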

The following result is an immediate consequence of the above rule and its
correctness.

Theorem 6. The problem t-Partite p-Cluster Editing admits a kernel with
$pt(2k+1) + 2k = O(ptk)$ vertices.

Proof. We now count the number of vertices we can have in a yes instance after
the rule above has been applied. We claim that if G has more than $pt(2k+1) + 2k$
vertices, it is a no instance.

Let (G, k) be the reduced instance according to Rule 1 and let F be a solution
of size at most k. At most 2k vertices can be touched by F, so the rest of the
graph remains as it is, and is a disjoint union of at most p complete t-partite
graphs, each of which has at most t non-isolate twin classes. It follows that in a
yes instance, G has at most $pt(2k+1) + 2k = O(ptk)$ vertices. □

5 Editing to bicluster graphs in subexponential time

In this section we lift the result of Section 3 by showing that the following
problem is solvable in time $2^{O(p\sqrt{k}\log(pk))} + O(n+m)$. Observe that we lose
the subexponential dependence on p; however, contrary to the result of Misra
et al. [22], for fixed (or small, relative to k) p, this still is truly subexponential
parameterized by k.

p-Bicluster Editing

Input: A graph G = (V, E) and a non-negative integer k.

Parameter: p, k

Question: Is there a set $F \subseteq [V]^2$ of edges of size at most k such
that $G \triangle F$ is a disjoint union of p complete bipartite graphs?

We denote a biclique of G as C = (A, B) and call the sets A, B the sides of C.
Before describing the algorithm for the general problem, we show that the
following simpler problem is solvable in linear time using a greedy algorithm:

Annotated Bicluster Editing

Input: A bipartite graph G = (A, B, E), a partition
$\mathcal{A} = \{A_1, A_2, \dots, A_p\}$ of A and a non-negative integer k.

Question: Is there a set $F \subseteq [V]^2$ of edges of size at most k such
that $G \triangle F$ is a disjoint union of p complete bipartite graphs,
each with one side in $\mathcal{A}$?

Lemma 7. Annotated Bicluster Editing is solvable in time $O(n+m)$.

Proof. Let G = (A, B, E), $\mathcal{A} = \{A_1, \dots, A_p\}$, k be an instance of Annotated
Bicluster Editing. Consider a vertex $v \in B$ and define $\mathrm{cost}_i(v)$ to be the cost
of placing v in $B_i$, where $C_i = (A_i, B_i)$ is the ith biclique of the solution, i.e.,

$$\mathrm{cost}_i(v) = |A_i| - 2\deg_{A_i}(v) + \deg(v),$$

where $\deg_{A_i}(v) = |N(v) \cap A_i|$. We prove the following claim, which implies that
we can greedily assign each vertex $v \in B$ to a biclique of minimum cost.

Claim 8. An optimal solution will always have $v \in B$ in a biclique $C_i = (A_i, B_i)$
which minimizes $\mathrm{cost}_i(v)$.

Suppose that $\mathrm{cost}_i(v)$ is minimal but v is placed by a solution F in a biclique
$C_j = (A_j, B_j)$ with $\mathrm{cost}_j(v) > \mathrm{cost}_i(v)$. Deleting from F all edges $E_j$ between v
and $A_j$ and adding all edges $E_i$ between v and $A_i$ creates a new solution
$F' = (F \setminus E_j) \cup E_i$. Since $\mathrm{cost}_j(v) > \mathrm{cost}_i(v)$, we have that $|F| > |F'|$, hence F
is not optimal. This concludes the proof of the claim and the lemma. □
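The greedy assignment behind Lemma 7 can be sketched as follows, assuming the parts $A_1, \dots, A_p$ are given as a list of sets and every edge as a pair (a, v) with a in A and v in B. For each vertex only the parts it touches and one globally smallest part are inspected, since an untouched part has cost $|A_i| + \deg(v)$. The function name and return format are assumptions for illustration.

```python
def annotated_bicluster_cost(parts, b_vertices, edges):
    """Greedily place each v in B; return (total edit cost, assignment)."""
    part_of = {a: i for i, part in enumerate(parts) for a in part}
    neighbors = {v: [] for v in b_vertices}
    for a, v in edges:
        neighbors[v].append(a)
    smallest = min(range(len(parts)), key=lambda i: len(parts[i]))
    assignment, total = {}, 0
    for v in b_vertices:
        # deg_in[i] is the number of neighbors of v inside part i.
        deg_in = {}
        for a in neighbors[v]:
            deg_in[part_of[a]] = deg_in.get(part_of[a], 0) + 1

        def cost(i):
            # cost_i(v) = |A_i| - 2 * deg_{A_i}(v) + deg(v)
            return len(parts[i]) - 2 * deg_in.get(i, 0) + len(neighbors[v])

        best = min(set(deg_in) | {smallest}, key=cost)
        assignment[v] = best
        total += cost(best)
    return total, assignment
```

The instance is a yes-instance exactly when the returned total is at most the budget k; the work per vertex is proportional to its degree.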


5.1 Subexponential time algorithm 

We now show that the problem p-Bicluster Editing is solvable in
subexponential time by using the kernel from Theorem 6, guessing the annotated sets
and applying the polynomial time algorithm for the annotated version of the
problem. The important ingredient will be cheap vertices, by which we mean
vertices that are known to receive very few edits. Intuitively, a cheap vertex is a
“pin” that in subexponential time reveals for us its neighborhood in the solution,
and thus can be leveraged to uncover parts of said solution.

We adopt the following notation and vocabulary. For an instance (G, k) of
p-Bicluster Editing and a solution F, we call $H = G \triangle F$ the target graph.
A vertex v is called cheap with respect to F if it receives at most $\sqrt{k}$ edits.
Observe that any set X of size larger than $2\sqrt{k}$ has a cheap vertex: the at most k
edits of F have at most 2k endpoints in total, so the vertices of X cannot all
receive more than $\sqrt{k}$ edits. We call such a set large and all sets that contain
at most $2\sqrt{k}$ vertices small. We will further classify the bicliques in the target
graph into two different classes: A biclique is small if its vertex set is small and
large otherwise.
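The counting argument for cheap vertices can be illustrated with a short sketch. It counts the edits incident to each vertex and returns the first candidate receiving at most $\sqrt{k}$ of them; the comparison is done on squares to stay in integer arithmetic. The function name and the representation of edits as pairs are assumptions for this example.

```python
from collections import Counter


def find_cheap_vertex(candidates, edits, k):
    """Return a vertex among `candidates` with at most sqrt(k) incident edits."""
    incident = Counter()
    for u, v in edits:
        incident[u] += 1
        incident[v] += 1
    for v in candidates:
        if incident[v] ** 2 <= k:  # same as incident[v] <= sqrt(k)
            return v
    return None
```

If the edit set has size at most k and more than $2\sqrt{k}$ candidates are given, the function cannot return None, because the edits have at most 2k endpoints in total.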

The algorithm now works as follows. Given an input instance (G, k) of
p-Bicluster Editing, we try all combinations of $p_s + p_\ell = p$, with the intended
meaning that $p_s$ is the number of small bicliques and $p_\ell$ is the number of large
bicliques in the target graph.

Handling small bicliques. We enumerate a family $\mathcal{A}_s \subseteq 2^{V}$ of $p_s$ sets with
the property that they are pairwise disjoint, and each of size at most $2\sqrt{k}$.
Furthermore, $G[\bigcup \mathcal{A}_s]$ contains at most k edges. Delete all edges in
$G[\bigcup \mathcal{A}_s]$ and reduce the budget accordingly. These are going to be all the left
sides in small bicliques. This enumeration takes time $2^{O(p\sqrt{k}\log(pk))}$: on the
kernel of Theorem 6, which has $O(pk)$ vertices for t = 2, there are at most
$(O(pk))^{2\sqrt{k}}$ candidates for each of the at most p sets.

Handling large bicliques. The large bicliques have the following nice
property. Since the vertex set of each such biclique is large, every biclique
contains a cheap vertex. We guess a set $B_\ell$ of size $p_\ell$. For the biclique $C_i$,
the vertex $v_i$ of $B_\ell$ will be a cheap vertex in $B_i$. Now, we enumerate all
combinations of $p_\ell$ sets $\mathcal{N} = (N_1, N_2, \dots, N_{p_\ell})$, each of size at most $2\sqrt{k}$,
which will be the edited neighborhood of each cheap vertex, and we conclude
that $A_i = N_H(v_i) = N_G(v_i) \triangle N_i$. The enumeration of this asymptotically takes
time $2^{O(p\sqrt{k}\log(pk))}$.


Putting things together. With the above two steps, in time $2^{O(p\sqrt{k}\log(pk))}$
we obtained all the left sides $\mathcal{A}$, partitioned into $\mathcal{A}_s$ and $\mathcal{A}_\ell$. Using this
information, we can in polynomial time compute whether the Annotated Bicluster
Editing instance $(G, k, \mathcal{A})$ is a yes-instance. If so, we conclude yes; otherwise,
we backtrack.

Theorem 9. p-Bicluster Editing is solvable in time $2^{O(p\sqrt{k}\log(pk))} + O(n+m)$.
Proof. We now show that the algorithm described above correctly decides
p-Bicluster Editing given an instance (G, k). Suppose that the algorithm above
concludes that (G, k) is a yes instance. The only time it outputs yes is when
Annotated Bicluster Editing for a given set $\mathcal{A}$ and a given budget k' outputs
yes. Since this budget is the leftover budget from making $\mathcal{A}$ an independent set,
it is clear that any Annotated Bicluster Editing solution of size at most k'
gives a yes instance for p-Bicluster Editing.

Suppose now for the other direction that (G, k) is a yes instance for
p-Bicluster Editing and let F be a solution. Consider the left sides $A_1, \dots, A_p$
of $G \triangle F$ with the restriction that the smaller of the two sides in $C_i$ is named $A_i$.
First we observe that during our subexponential time enumeration of sets, all
the $A_i$ that are of size at most $2\sqrt{k}$ will be enumerated in one of the branches
where $p_s$ is set to the number of small bicliques. Furthermore, if $A_i$ is large,
then both sides are large, and then, for each of the large bicliques, there is a
branch where we selected exactly one cheap vertex for each of the largest sides.
Given these cheap vertices, there is a branch where we guess exactly the edits
affecting each of the cheap vertices, hence we can conclude that in some branch,
we know the entire partition $\mathcal{A}$. From Lemma 7, we can conclude that the
algorithm described above concludes correctly that we are dealing with a
yes-instance. □

5.2 The t-partite case 

We can in fact obtain similar results for the more general case of
t-Partite p-Cluster Editing (we treat t here as a constant, so the results hold up
to constant factors in the exponents). Again we need the polynomial kernel
described in Theorem 6. The only difference now to the bicluster case is that we
define a cluster to be small if every side is small. In this case, we can enumerate
the sets which will form the small clusters.

In the other case a cluster $C = (A_1, A_2, \dots, A_t)$ is divided into small sides
$A_1, A_2, \dots, A_{t_s}$ and large sides $A_{t_s+1}, A_{t_s+2}, \dots, A_t$. For this case, we guess
all the small sides and for each of the large sides we guess a cheap vertex.
Guessing the neighborhoods $N_{t_s+1}, N_{t_s+2}, \dots, N_t$ for the cheap vertices
$v_{t_s+1}, v_{t_s+2}, \dots, v_t$ gives us complete information on C: to compute what $A_j$
is, if $j > t_s$, we simply take the intersection $\bigcap_{t_s < i \le t,\, i \neq j} N_i$ and remove
$\bigcup_{i \le t_s} A_i$. We arrive at the following theorem, whose proof is directly
analogous to that of Theorem 9.

Theorem 10. The problem t-Partite p-Cluster Editing is solvable in
subexponential time $2^{O(p\sqrt{k}\log(pk))} + O(n+m)$.


6 Lower bounds 

We show that (a) Starforest Editing is NP-hard and that we cannot expect
a subexponential algorithm unless the ETH fails; and (b) that p-Starforest
Editing is W[1]-hard parameterized only by p.

6.1 Starforest editing 

In the following we describe a linear reduction from 3Sat to Starforest
Editing. Furthermore, the instance we reduce to has maximal degree three,
thus not only showing that Starforest Editing is NP-hard on graphs of
bounded degree, but also not solvable in subexponential time on subcubic
graphs.

Figure 1: Reduction from 3Sat to Starforest Editing on subcubic graphs.

Theorem 11. The problem Starforest Editing is NP-complete and,
assuming ETH, does not admit a subexponential parameterized algorithm when
parameterized by the solution size k, i.e., it cannot be solved in time $O^*(2^{o(k)})$,
nor in exact exponential time $O^*(2^{o(n+m)})$, even when restricted to subcubic
graphs.

To prove the theorem above we will reduce from 3Sat. But to obtain the result,
it is crucial that in our reduction, both the parameter k and the size of the
instance G are bounded linearly in n and m. Such results have been shown
earlier, in particular by Komusiewicz and Uhlmann for Cluster Editing [20]
and Drange and Pilipczuk for Trivially Perfect Editing [12]. Thus we
resort to similar reductions as used there; however, the reductions here are
tweaked to work for the problem at hand. We also achieve lower bounds for
subcubic graphs. See Figures 1a and 1b for figures of the gadgets.

Variable gadget. Let $\varphi$ be an input instance of 3Sat, and denote its variable
set and clause set as $V(\varphi)$ and $C(\varphi)$, respectively. We construct for $x \in V(\varphi)$
a graph $G_x = C_{6p_x}$, where $p_x$ is the number of clauses in $\varphi$ which x appears
in. The vertices of $G_x$ are labeled, consecutively, $\top^x_i, \bot^x_i, A^x_i, B^x_i, C^x_i, D^x_i$ for
$i \in [0, p_x - 1]$.

There are exactly three ways of deleting $G_x$ into a starforest using at most
$k_x = 2p_x$ edges. Clearly a collection of $P_3$s is a starforest and is our target graph.
We will define the $\top$-deletion for $G_x$ as the deletion set
$S^x_\top = \{C^x_i D^x_i, \bot^x_i A^x_i \mid i \le p_x - 1\}$ and the $\bot$-deletion for $G_x$ as the deletion set
$S^x_\bot = \{A^x_i B^x_i, D^x_i \top^x_{i+1} \mid i \le p_x - 1\}$, taking the i + 1 in the index of $\top^x_{i+1}$
modulo $p_x$. In other words, in the gadget $G_x$, we are keeping the edges

• $D^x_{i-1} \top^x_i \bot^x_i$, $A^x_i B^x_i C^x_i$, when x is set to true, and

• $\top^x_i \bot^x_i A^x_i$, $B^x_i C^x_i D^x_i$, when x is set to false.

Observe that when x is set to true, we will have paths on three vertices, where $\top^x_i$
is the middle vertex, and if x is set to false, we will have paths on three vertices
with $\bot^x_i$ being the middle vertex. Later, we will see that if x satisfies a clause c,
the ith clause x appears in, then either $\top^x_i$ or $\bot^x_i$ will be the middle vertex of a
claw, depending on whether x appears positively or negatively in c.
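The variable gadget and its two deletion sets can be checked mechanically. The sketch below builds the cycle on 6p vertices for a single variable with p occurrences, using the ASCII labels "top" and "bot" for the two truth-value vertices (hypothetical names chosen for this example), and reports which vertices end up as middle vertices of the remaining paths.

```python
LABELS = ["top", "bot", "A", "B", "C", "D"]


def variable_gadget(p):
    """Edge set of the cycle on 6p vertices, labeled consecutively per index i."""
    order = [(label, i) for i in range(p) for label in LABELS]
    n = len(order)
    return {frozenset((order[j], order[(j + 1) % n])) for j in range(n)}


def true_deletion(p):
    """Deletion set used when the variable is set to true."""
    return ({frozenset((("C", i), ("D", i))) for i in range(p)}
            | {frozenset((("bot", i), ("A", i))) for i in range(p)})


def false_deletion(p):
    """Deletion set used when the variable is set to false (index i + 1 mod p)."""
    return ({frozenset((("A", i), ("B", i))) for i in range(p)}
            | {frozenset((("D", i), ("top", (i + 1) % p))) for i in range(p)})


def middle_vertices(edges):
    """Vertices of degree two, i.e. the centers of the remaining three-vertex paths."""
    degree = {}
    for e in edges:
        for v in e:
            degree[v] = degree.get(v, 0) + 1
    return {v for v, d in degree.items() if d == 2}
```

Both deletion sets remove 2p edges, and the middle vertices are the "top" vertices (together with the B vertices) in the true case and the "bot" vertices (together with the C vertices) in the false case.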

Observation 12. In an optimal edge edit of a cycle of length divisible by 6, no 
edge is added and exactly every third consecutive edge is deleted. 

Clause gadget. A clause gadget simply consists of one vertex, i.e., for a 
clause $c \in C(\varphi)$, we construct the vertex $v_c$. This vertex will be connected 
to $G_x$, $G_y$ and $G_z$, for $x, y, z$ being its variables, in appropriate places, depending 
on whether or not the variable occurs negated in $c$. In fact, it will be connected 
to $\top_i^x$ if $c$ is the $i$th clause $x$ appears in, and $x$ appears positively in $c$, and it 
is connected to $\bot_i^x$ if $c$ is the $i$th clause $x$ appears in, and $x$ appears negatively 
in $c$. 

Let $k_\varphi = 2|C| + \sum_x 2p_x = 2|C| + 3 \cdot 2|C| = 8|C|$ be the budget for a formula $\varphi$. 
We now observe that the budget is tight. 

Lemma 13. The graph $G_\varphi$ has no starforest editing set of size less than $k_\varphi$, 
and if the editing set has size $k_\varphi$ it contains only deletions. 

Proof. It is straightforward to verify that for each induced variable gadget $G_x$ we 
need at least $2p_x$ edges. Since every clause contains three variables, we have 
$\sum_x p_x = 3|C|$, and the necessary budgets sum up to exactly $\sum_x 2p_x = 2 \cdot 3|C| = 6|C|$. 

Since no two consecutive edges in $G_x$ will be deleted, by the previous 
observation, we have that for each clause, after deleting edges in the variable gadgets, 
we will have an induced subdivided claw with the clause vertex as its center, 
and this graph needs at least two edits to become a starforest. This can be 
verified by observing that we have three induced $P_5$s, and at most two of them 
can be removed by one edge edit. 

From the above analysis, we can conclude that $G_\varphi$ needs at least $6|C| + 2|C| = 
8|C| = k_\varphi$ edits to become a starforest graph. □ 

We now continue to the main lemma, from which Theorem 11 follows. 

Lemma 14. A 3Sat instance $\varphi$ is satisfiable if and only if $(G_\varphi, k_\varphi)$ is a yes 
instance for Starforest Editing. 

Proof. Suppose $\varphi$ is satisfiable and let $\alpha\colon V(\varphi) \to \{\top, \bot\}$ be a satisfying 
assignment. We show that $G - F$ for $F$ defined below is a starforest graph and 
that $|F| \le k_\varphi$ (since the budget is tight, we have equality). For $x \in V(\varphi)$ we 
define $F_x$ to be the following set of edges: 

• $F_x = \{C_i^x D_i^x, \bot_i^x A_i^x \mid i \le p_x - 1\}$, if $\alpha(x) = \top$. 

• $F_x = \{A_i^x B_i^x, D_i^x \top_{i+1}^x \mid i \le p_x - 1\}$, if $\alpha(x) = \bot$. 

Finally, for a clause $c \in C(\varphi)$, let $x_c$ be a variable satisfying $c$. Define $F_c$ to be 
the two edges incident to $v_c$ that do not lead to $G_{x_c}$. 

We now show that $F = \bigcup_{x \in V(\varphi)} F_x \cup \bigcup_{c \in C(\varphi)} F_c$ is our solution. It should at this 
point be clear that $|F| \le k_\varphi$. Since $G_x - F_x$ is a collection of $P_3$s, we only need 
to verify that no clause gadget $c$ is in an obstruction. Let $c$ be an arbitrary clause 
gadget and let $x_c$ be the variable that is still incident to $c$. Clearly, since $c$ is of 
degree 1, it has to be in an obstruction with $x_c$. However, since $x_c$ satisfies $c$, 
and (for the moment) assuming that $x_c$ appears positively in $c$, $\alpha(x) = \top$ and 
from $G_x$, we deleted $C_i^x D_i^x$ and $\bot_i^x A_i^x$ for every $i$. Since $c$ connects to some 
vertex $\top_i^x$, the connected component containing $c$ is a claw centered in $\top_i^x$ with 
leaves $c$, $D_{i-1}^x$ and $\bot_i^x$. Hence $c$ cannot be in an obstruction. The case when $x_c$ 
appears negatively is symmetric. This concludes the forward direction of the 
proof. 

For the reverse direction, suppose $(G_\varphi, k_\varphi)$ is a yes-instance for Starforest 
Editing and let $F$ be a solution. Since the budget is tight, by the above lemma 
and observation, we know that $F$ contains only deletions. There are unique 
ways of deleting all the $G_x$s for the variable gadgets, so construct an assignment 
for the variables of $\varphi$, $\alpha_F\colon V(\varphi) \to \{\top, \bot\}$, by letting $\alpha_F(x) = \top$ if for some 
$i$, the edge $\bot_i^x A_i^x$ is deleted, and let $\alpha_F(x) = \bot$ otherwise. We claim that $\alpha_F$ 
is a satisfying assignment. Suppose that a clause $c$ is not satisfied by any of 
its variables, and consider $x_c$, the variable $c$ is still adjacent to. We know it 
must be adjacent to at least one vertex since the budget is tight (not all three 
edges were deleted). Suppose $x_c$ appeared positively in $c$ (thus the vertex for 
$c$ is adjacent to some $\top_i^x$). Since $G - F$ is a starforest (recall that $F$ can only 
contain deletions in the given budget), we know that in the subgraph $G_x$ to 
which $x_c$ belongs, we must have deleted the edge $\bot_i^x A_i^x$, for otherwise, since 
every third edge is deleted, the edges $D_{i-1}^x \top_i^x$ and $C_{i-1}^x D_{i-1}^x$ would remain and 
form an induced $P_4$, contradicting the assumption that $G - F$ was a starforest 
graph. But since $\bot_i^x A_i^x$ was deleted, by the construction of $\alpha_F$, we set $x$ to true, 
so $x$ indeed satisfies $c$, contradicting the initial assumption. The case where $x_c$ 
appears negatively in $c$ is symmetric. □ 
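
The forward direction of the lemma can be exercised on small formulas. The Python sketch below builds the graph of the reduction and the deletion set derived from an assignment. The input encoding (clauses as triples of signed integers, with -x denoting the negation of x), the vertex names, and the function names are assumptions made for this illustration; the sketch further assumes that the three variables of each clause are distinct and that the given assignment satisfies every clause.

```python
from collections import Counter

LABELS = ("T", "F", "A", "B", "C", "D")  # "T" stands for ⊤, "F" for ⊥ (naming assumed here)


def edge(u, v):
    return frozenset((u, v))


def is_starforest(edges):
    """A graph is a starforest iff every edge has an endpoint of degree one."""
    deg = Counter(v for e in edges for v in e)
    return all(min(deg[v] for v in e) == 1 for e in edges)


def build_instance(clauses):
    """Return (edges of the reduction graph, budget, occurrence counts p_x, attachments)."""
    p = Counter()
    attach = []  # attach[c][j]: gadget vertex that literal j of clause c is joined to
    for lits in clauses:
        row = []
        for lit in lits:
            x = abs(lit)
            # clause c is the p[x]-th clause (counting from 0) in which x appears
            row.append((x, "T" if lit > 0 else "F", p[x]))
            p[x] += 1
        attach.append(row)
    edges = set()
    for x, px in p.items():  # one cycle of length 6 * p_x per variable
        cyc = [(x, lab, i) for i in range(px) for lab in LABELS]
        edges |= {edge(cyc[j], cyc[(j + 1) % len(cyc)]) for j in range(len(cyc))}
    for c, row in enumerate(attach):  # one vertex of degree three per clause
        edges |= {edge(("clause", c), v) for v in row}
    budget = 2 * len(clauses) + sum(2 * px for px in p.values())
    return edges, budget, p, attach


def deletion_set(clauses, alpha):
    """Deletion set F for a satisfying assignment alpha (dict: variable -> bool)."""
    edges, budget, p, attach = build_instance(clauses)
    F = set()
    for x, px in p.items():
        for i in range(px):
            if alpha[x]:   # ⊤-deletion
                F |= {edge((x, "C", i), (x, "D", i)), edge((x, "F", i), (x, "A", i))}
            else:          # ⊥-deletion, index i + 1 taken modulo p_x
                F |= {edge((x, "A", i), (x, "B", i)),
                      edge((x, "D", i), (x, "T", (i + 1) % px))}
    for c, (lits, row) in enumerate(zip(clauses, attach)):
        # keep the edge to one satisfying literal; raises StopIteration if none exists
        keep = next(v for lit, v in zip(lits, row) if alpha[abs(lit)] == (lit > 0))
        F |= {edge(("clause", c), v) for v in row if v != keep}
    return edges, budget, F
```

For a satisfying assignment, `edges - F` should pass `is_starforest` and `len(F)` should equal the returned budget, mirroring the tightness argument above.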

Observe that the maximum degree of $G_\varphi$ is three: the clause vertices have 
degree exactly three, and the variable gadgets are cycles in which some vertices are 
connected to at most one clause vertex. This concludes the proof of Theorem 11. 
From the discussions above, the following result is an immediate consequence: 

Corollary 15. The problem Starforest Deletion is NP-complete and not 
solvable in subexponential time under ETH, even on subcubic graphs. 

Before going into parameterized lower bounds of Starforest Editing, we 
show that the exact same reduction above simultaneously proves similar results 
for the bicluster case. We note that the NP-hardness was shown by Amit [2], 
but their reduction suffers a quadratic blowup and is therefore not suitable for 
showing subexponential lower bounds. 

Corollary 16. The problems Bicluster Editing and Bicluster Deletion 
are NP-complete and not solvable in subexponential time under ETH, even on 
subcubic graphs. 

Proof. We show that every optimum solution of $(G_\varphi, k_\varphi)$ for the above 
constructed $G_\varphi$ and $k_\varphi$ will yield a starforest and hence the corollary follows from 
the above result. We first show that Observation 12 also holds for the Bicluster 
Editing case, that is, for budget $k_x$, there is a unique (up to rotation) solution 
which consists of deleting every third edge. First, we observe that deleting every 
third edge indeed is a solution as starforests are a subclass of biclusters. Second, 
we can pack $3p_x$ paths on four vertices such that each pair of $P_4$s share at most 
one edge, and such that any edit can eliminate at most two obstructions. Hence 
we need at least $k_x$ edits to eliminate all the $P_4$s. Since the budget is 
tight for $G_x$, we now show that we still need at least two edits to eliminate $G_c$. 
Consider a clause gadget. Since we have the same situation as above, i.e., it 
contains three induced $P_5$s, we observe that at least two edits inside the gadget 
are necessary. Suppose that one of the edits is an edge addition (needed to make 
a biclique that is not a star), then we must use that edge to construct a $C_4$. But 
this edit leaves one induced $P_5$ which cannot be resolved by the remaining edit. 

By combining the arguments for $G_x$ and $G_c$, we conclude that $(G_\varphi, k_\varphi)$ is a 
yes instance if and only if $\varphi$ is satisfiable and furthermore that the solution will 
only delete edges, thus yielding a starforest. □ 

6.2 W[1]-hardness parameterized by p 

In this section we show that the parameterization by $k$ is necessary, even for the 
case of p-Starforest Editing. That is, we show that when we parameterize 
by $p$ alone, the problem becomes W[1]-hard, and we can thus not expect any 
algorithms of the form $f(p) \cdot \mathrm{poly}(n)$ for any function $f$ solving p-Starforest 
Editing. We reduce from the problem Multicolored Regular Independent 
Set. An instance of this problem consists of a regular graph colored into $p$ color 
classes, each color class inducing a complete graph, and we are asked to find an 
independent set of size $p$. 

Proposition 17 ([8, Corollary 14.23]). The problem Multicolored Regular 
Independent Set is W[1]-complete. 

Since each color class is complete, any independent set will be of size at most $p$ 
and any independent set of size $p$ is maximum. The reduction is direct; in fact, 
we have that given a budget $k = (n - p)(d - 1)$, where $d$ is the regularity degree, 
the following direct translation between the two problems holds: 

Lemma 18. Let $G$ be a $d$-regular graph on $n$ vertices, $p \le n$ and $k = (n - p)(d - 1)$. 
Then $(G, p)$ is a yes instance for Multicolored Regular Independent 
Set if and only if $(G, k)$ is a yes instance for p-Starforest Editing. 

Proof. In the forward direction, suppose $S$ is an independent set of size $p$ in $G$. 
Then, since $S$ is maximal, every vertex in $G - S$ is adjacent to $S$. For every 
vertex $v \notin S$, delete $d - 1$ edges, but keep one connected to a vertex in $S$. Since 
there are $n - p$ vertices outside $S$, and since $S$ is an independent set, this is 
exactly all the edges we need to keep and we obtain a starforest editing with exactly 
budget $k$. 

For the reverse direction, let us assume that $G$ does not contain an independent 
set of size $p$. Hence, any set of $p$ centers contains at least one edge; the 
total budget needed to edit to a starforest is then at least $(n - p)(d - 1) + 1 > k$ 
and hence the answer for $(G, k)$ is no, as well. □ 

Combining Proposition 17 with Lemma 18 yields the following result: 

Theorem 19. p-Starforest Editing is W[1]-hard when parameterized by $p$ 
alone. 

7 Conclusion 


We presented subexponential time algorithms for editing problems towards 
bicluster graphs, and more generally, $t$-partite cluster graphs when the number 
of connected components in the target graph is bounded. We supplemented 
these findings with lower bounds, showing that this dual parameterization is 
indeed necessary. 

As an interesting open problem, we pose the question of whether t-Partite 
p-Cluster Editing can be solved in time $2^{O(\sqrt{pk})} n^{O(1)}$, i.e., in subexponential 
time with respect to both parameters. It is known that Bicluster Editing 
admits a linear kernel, but when introducing the secondary parameter, we only 
obtain a kernel whose size is bounded by the product of both parameters. Recall 
that we got a $tp(2k + 1) + 2k$ kernel, which in the bicluster case is $p(4k + 2) + 2k$. 
Does Bicluster Editing admit a truly linear kernel, i.e., a kernel with $O(p + k)$ 
vertices? 